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Simulation  Optimization 


Generically  stated,  simulation  optimization  is  the  problem: 

maxf(x) 

X 


where 

»  f  cannot  be  evaluated  analytically.  It  must  instead  be  evaluated 
through  Monte  Carlo  simulation. 

•  We  also  assume  the  simulation  is  takes  a  long  time  to  run,  so  our 
goal  is  to  do  as  well  as  we  can  with  a  limited  budget  of  function 
evaluations  (computationally  expensive  simulation). 


Stochastic  Root  Finding 


Find  the  root  of  a  monotone  function  whose  values  can  only  be  measured 
with  noise.  Use  as  few  function  evaluations  as  possible. 


This  is  equivalent  to  maximizing  a  concave  function  whose  gradient  values 
are  measured  with  noise. 


Multiple  Comparisons  with  a  Standard 


Identify  those  points  x  whose  value  f(x)  is  above  a  threshold,  when 
function  evaluations  are  noisy.  Use  as  few  function  evaluations  as  possible. 


In  discrete  domains,  this  is  multiple  comparisons  with  a  standard. 
In  continuous  domains,  this  is  finding  level  sets. 


Overview 


«  Simulation  optimization  problems  can  be  formulated  in  a  Bayesian 
decision  theoretic  framework. 

•  When  formulated  in  this  way,  optimal  policies  can  be  characterized 
using  dynamic  programming. 

«  (This  is  well  known,  although  perhaps  not  obvious). 

»  Novel  results: 

a  We  find  the  Bayes-optimal  policy  for  an  idealized  version  of  stochastic 
root-finding  and  apply  this  to  the  real  stochastic  root-finding  problem. 
a  We  find  the  Bayes-optimal  policy  for  sequential  multiple  comparisons 
with  a  standard. 


Outline 


^  General  Decision-Theoretic  Framework 
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Q  Multiple  Comparisons  with  a  Standard 


General  Decision-Theoretic  Framework 


Q  Before  we  begin,  nature  fixes  some  true  function  f. 

<t  e.g.,  f  is  the  function  whose  root  we  are  trying  to  find. 

Q  For  time  n  =  1  up  to  N,  we: 

Q  Choose  some  type  of  measurement  x„  to  perform. 

Q  Observe  some  value  yn  whose  distribution  depends  on  xn  and  f. 

O  After  the  final  measurement,  we  make  some  decision  x*  based  on  the 
data  xi,...,x/v,  yi,...,yN. 

O  We  pay  some  penalty  L(jc*,f). 

Our  goal  is  to  choose  xi, . . .  ,x/y,x*  adaptively  to  minimize  Z_(ic* ,  f). 


General  Decision-Theoretic  Framework 


•  Kn[L(x^,f)  |  f]  is  the  expected  loss  when  we  use  policy  n  and  the 
truth  is  f. 

a  A  policy  n  is  a  method  for  choosing  each  xn+\  as  a  function  of 
xi:n,yi:n,  and  choosing  x,  as  a  function  of  Xi;A/,yi:/v. 

9  We  cannot  solve  inf^E7r[/.(i4,  f)  \  f]  because  the  answer  depends  on 
f,  which  we  do  not  know. 

9  In  this  talk,  we  are  interested  in  average  case  performance. 


General  Decision-Theoretic  Framework 


•  The  average-case  loss  under  a  probability  distribution  Po  is 

J E *[/.(£,  0  |  f}F0(df)=E”[L(x*,f)} 

9  Po  is  our  prior,  and  it  may  represent  our  subjective  belief  about  f. 

It  is  the  measure  under  which  we  evaluate  average-case  performance. 
9  Goal:  minimize  the  average-case  loss 


General  Decision-Theoretic  Framework 


«  Associated  with  the  prior  is  a  sequence  of  posterior  distributions 

Pn(/rG  •)  =  P0{f  G  •  |xi:n,yi:n}. 

»  The  problem  inf^E11  [Z_(x*,  f )]  can  be  viewed  as  a  Markov  decision 
process,  where  the  Markov  process  being  controlled  is  the  sequence 
of  posterior  distributions  (P„  :  n  >  0). 

«  As  a  Markov  decision  process,  its  solution  is  characterized  by  the 

dynamic  programming  equations. 


Dynamic  Programming 


«  To  solve  the  DP  numerically,  we  must  compute  the  value  function  Vn 
for  every  possible  posterior  distribution  P„. 

»  This  is  usually  much  too  large  to  solve  numerically  via  brute  force. 
•  Finding  analytic  solutions  to  dynamic  programs  is  usually  also  difficult. 
«  Approximate  DP  might  be  fruitful,  but  we  do  not  pursue  it  here. 

»  In  this  talk,  dynamic  programming  is  used  as  a  theoretical  tool. 

»  In  stochastic  root-finding  we  solve  the  DP  analytically. 

»  In  multiple  comparisons  with  a  standard,  we  use  analytic  techniques  to 
decompose  the  original  DP  into  a  collection  of  easy-to-solve  problems. 
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Q  Multiple  Comparisons  with  a  Standard 


Stochastic  Root-Finding 


Stochastic  Root-Finding 


9  f  :  M  i— >  M  is  a  decreasing  function. 

«  The  only  way  to  evaluate  f  is  via  stochastic  simulation. 

«  We  observe  yn  =  f(xn)  +  en,  where  en  is  independent  noise. 

9  Our  goal  is  to  find  a  root  x*,  i.e.,  a  point  x*  such  that  f(x*)  =  0. 
9  Central  Question:  Given  a  budget  of  N  measurements,  xi,...,x/y, 
how  should  we  place  them  to  find  x*  as  accurately  as  possible? 


Stochastic  Root-Finding:  Model 


«  Assume  we  observe  only  whether  y„  is  larger  or  smaller  than  0. 


«  Assume  nature  gives  the  incorrect  sign  with  fixed  known  probability  q 


sgn(y„) 


— sgn(f(x„)),  with  probability  q, 
sgn (f(xn)),  with  probability  1  —  q 


•  Ongoing  work  examines  how  we  can  relax  these  assumptions. 


Posterior  Distributions 


»  Place  a  prior  density  po  on  the  root  x*,  e.g.,  uniform  on  [0,1]. 

9  Each  measurement  xn  produces  a  new  posterior  density  p„  on  x*: 

Pn(x)  =  P{X*  E  dx  I  Xi;„_i,yi:„_i} 
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Posterior  Distributions 


Posterior  Density  pn(x)  Posterior  Density  pn(x) 
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Stochastic  Root-Finding 


«  Our  goal  is  to  minimize  the  final  entropy 

H{Pn)  =  ~J  p/v(*)logpA/(x)ofx. 

»  The  value  function  is 


Vn(Pn )  =  infE n[H(pN)  I  Pn\, 

K 

and  the  stochastic  optimization  problem  we  need  to  solve  is  to  find 
the  policy  n  that  attains  the  infimum  defining  Vo(po). 

«  Bellman's  recursion  can  be  written, 

V„(pn)  =  infXn€[0)i] E [Vn+i(pn+i)  |  pn}. 


Main  Result:  Bayes  Optimality 


Theorem 

The  value  function  can  be  written  explicitly  as 

V(Pn)  =  H(pn)  —  (N  —  n)  [  <7  log2 (qr)  —  (1  —  q)  log2(l  —  <7)] , 

and  the  policy  that  chooses  xn  at  the  median  of  pn  is  optimal. 


Example 
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Experimental  Results 


Performance  on  a  problem  with  f(x )  =  exp(x)  —  exp(l/3)  and  domain 
[0,1].  Error  is  |x„  —  x*|  on  one  sample  path.  Stochastic  approximation  uses 
stepsize  1/n. 


Geometric  Convergence 


«  <7  is  fixed. 

«  Updates  are  done  using  q. 

9  Then  xn  converges  geometrically  to  the  root  x*. 


Theorem 

Fix  e  >  0  and  let  Ae  =  [x*  —  £,  x*  +  e] . 

Let  z  =  inf  {n  :  xn  £  AE}. 

Let  c(q)  =  l  +  q\og2{q)  +  (l-q)\og2{l-q).  Then 

E[t]  <  -log(2e)/c(q) 

[This  result  is  from  joint  work  with  Shane  Henderson] 


Ongoing  Work:  Unknown  Non-Constant  q(x) 


In  practice,  the  error  probability  q(x)  is  unknown  and  varies  with  x. 

To  address  this,  we  are  investigating  several  approaches,  including: 

9  Fix  a  value  q  >  1/2  and  an  error  tolerance  £  >  0. 

9  At  each  point  xn,  take  multiple  samples  in  a  sequential  fashion  to 
estimate  q{x)  and  to  construct  a  composite  bit  yn  that  is  correct  with 
at  least  probability  q. 

9  Stop  when  the  estimated  q(x)  is  within  e  of  1/2. 

[Joint  work  with  Shane  Henderson  and  Rolf  Waeber] 
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Multiple  Comparisons  with  a  Standard 


We  now  consider  a  different  problem: 

Given  a  limited  budget  of  function  evaluations,  determine  which 
alternatives  have  value  exceeding  a  threshold. 


[Joint  work  with  Jing  Xie,  PhD  student  at  Cornell  University] 


Problem  Formulation 


»  Given  alternatives  x  =  1, . . . ,  k  and  a  threshold  d  £  M. 

«  Samples  from  alternative  x  are  Normal(/tx, a^). 

®  jiix  is  unknown,  while  ctx  is  assumed  known  (can  be  relaxed). 

9  Our  prior  on  each  fj,x  is  independent  and  normal. 

•  Sampling  continues  until  an  external  deadline  N  requires  it  to  stop. 

9  The  algorithm  assumes  N  is  unknown  and  geometrically  distributed. 

9  When  sampling  stops,  we  estimate  the  level  set  {x :  jUx  >  d}  based  on 
the  samples.  The  reward  is  the  number  of  alternatives  correctly 
classified. 


Example 


Administrators  are  considering  allocations  of  ambulances  across  11  bases 
in  the  city  of  Edmonton.  They  want  to  know  which  allocations  satisfy 
mandated  minimums  for  percentage  of  calls  answered  in  time,  under  a 
variety  of  different  possible  call  arrival  rates. 


_ 


[Thanks  to  Shane  Henderson  and  Matt  Maxwell  for  providing  the  ambulance  simulation] 


Optimal  Solution 


9  The  expected  reward  is  the  expected  number  of  alternatives  correctly 
classified  at  the  end. 

9  We  decompose  this  expected  reward  into  an  infinite  sum  of 
discounted  expected  one-step  rewards 


R0  +  Ex 


£  a"  1R, 

n— 1 


Here, 

9  a  is  the  parameter  of  the  geometric  distribution  governing  N. 

9  R0  is  the  expected  reward  if  we  stop  after  taking  no  samples. 

9  Rn  is  the  expected  one-step  improvement,  due  to  sampling,  of  the 
probability  of  correctly  classifying  alternative  xn. 


Optimal  Solution 


«  Written  in  this  way,  the  problem  becomes  a  multi-armed  bandit. 

«  Results  from  [Gittins  &  Jones  1974]  show  that  the  optimal  solution 


argmaxvx(Snx), 

X 


where  Snx  is  a  parameterization  of  the  marginal  posterior  on  /tx  and 
vx(-)  is  the  Gittins  index. 

«  The  Gittins  index  vx(-)  is  defined  in  terms  of  a  single-alternative 
version  of  the  problem 


Vx(s) 
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«  We  can  compute  Gittins  indices  efficiently  because  the 
single-alternative  problem  is  much  smaller  than  the  full  DP. 


Application  to  Ambulance  Positioning 


500  PE  samples 


500  MV  samples 


500  OPT  samples 


1000  PE  samples 


1000  MV  samples 


1 000  OPT  samples 


PE=pure  exploration  (sample  at  random); 

MV=max  variance  (equal  allocation);  OPT=optimal  policy. 


Conclusion 


«  We  found  the  optimal  policy  for  two  types  of  simulation 

optimization.  Other  problems  where  preliminary  results  suggest  that 
the  DP  can  be  solved: 

»  Finding  level  sets  of  1-dimensional  continuous  functions. 

o  Multidimensional  stochastic  root-finding  and  noisy  convex  optimization. 

»  Having  the  optimal  solutions  allows  us  to  better  understand 
approximate  policies,  e.g.,  knowledge-gradient  (KG)  policies. 

9  The  KG  policy  is  optimal  for  idealized  stochastic  root-finding. 

9  The  KG  policy  is  almost  optimal  for  multiple  comparisons  with  normal 
rewards  and  no  cost  of  sampling,  but  performs  poorly  with  Bernoulli 
rewards  or  a  cost  of  sampling. 

9  This  improved  understanding  will  let  us  use  heuristic  policies  more 
effectively  in  problems  for  which  we  cannot  compute  the  optimal  policy. 

9  This  decision-theoretic  approach  is  a  useful  theoretical  framework 
for  developing  new  and  better  algorithms  for  simulation  optimization. 


Thank  You 


Any  questions? 


